Contextually-Based Data-Derived Pronunciation Networks for Automatic Speech Recognition
Abstract
The context in which a phoneme occurs leads to consistent differences in how it is pronounced. Phonologists use a variety of contextual descriptors, based on factors such as stress and syllable boundaries, to explain phonological variation. However, pronunciation networks for speech recognition systems make little explicit use of context beyond whole-word models and triphone models. This paper describes the creation of pronunciation networks using a wide variety of contextual factors, which allow better prediction of pronunciation variation. We use a phoneme-level representation, which makes it easy to add new words to the vocabulary, together with a flexible context representation that can model long-range effects extending over syllables and across word boundaries. To incorporate this wide variety of factors into the pronunciation networks, we use data-derived context trees, which have properties that are useful for pronunciation network creation.
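The abstract does not say how the context trees are grown. As a rough illustration only, assuming a standard greedy decision-tree procedure (binary questions of the form "feature == value", chosen to maximize the count-weighted reduction in entropy of the observed surface phones), a context tree for a single phoneme might be sketched as follows. The splitting criterion, the feature names (`next_stress`, `prev`, `word_final`), the thresholds `min_gain` and `min_count`, and the toy /t/ data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2
from typing import Dict, List, Optional, Tuple

Context = Dict[str, str]        # hypothetical contextual features of one phoneme token
Example = Tuple[Context, str]   # (context, observed surface phone)


def entropy(counts: Counter) -> float:
    """Entropy (bits) of the surface-phone distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


@dataclass
class Node:
    dist: Dict[str, float]                       # P(surface phone | contexts reaching node)
    question: Optional[Tuple[str, str]] = None   # (feature, value) tested at this node
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def grow(examples: List[Example], min_gain: float = 0.05, min_count: int = 2) -> Node:
    """Greedily grow a context tree for one phoneme (assumed entropy criterion)."""
    counts = Counter(phone for _, phone in examples)
    total = len(examples)
    node = Node(dist={p: c / total for p, c in counts.items()})
    parent_cost = total * entropy(counts)
    best = None
    # Candidate questions: every (feature, value) pair observed in the data.
    candidates = sorted({(k, val) for ctx, _ in examples for k, val in ctx.items()})
    for f, v in candidates:
        yes = [e for e in examples if e[0].get(f) == v]
        no = [e for e in examples if e[0].get(f) != v]
        if len(yes) < min_count or len(no) < min_count:
            continue  # skip splits that leave too few tokens on one side
        cost = sum(len(s) * entropy(Counter(p for _, p in s)) for s in (yes, no))
        gain = (parent_cost - cost) / total  # average entropy reduction per token
        if gain >= min_gain and (best is None or gain > best[0]):
            best = (gain, (f, v), yes, no)
    if best is not None:
        _, node.question, yes, no = best
        node.yes = grow(yes, min_gain, min_count)
        node.no = grow(no, min_gain, min_count)
    return node


def predict(node: Node, ctx: Context) -> Dict[str, float]:
    """Follow the questions down to a leaf and return its phone distribution."""
    while node.question is not None:
        feature, value = node.question
        node = node.yes if ctx.get(feature) == value else node.no
    return node.dist


# Toy, invented realizations of /t/: flap "dx", glottal stop "q", or plain "t".
# next_stress = stress of the following syllable ("none" if word-final).
data: List[Example] = (
    [({"next_stress": "unstressed", "prev": "vowel", "word_final": "no"}, "dx")] * 4
    + [({"next_stress": "stressed", "prev": "vowel", "word_final": "no"}, "t")] * 4
    + [({"next_stress": "none", "prev": "vowel", "word_final": "yes"}, "q")] * 3
    + [({"next_stress": "none", "prev": "vowel", "word_final": "yes"}, "t")] * 1
)

tree = grow(data)
print(tree.question)  # ('next_stress', 'unstressed')
print(predict(tree, {"next_stress": "none", "prev": "vowel", "word_final": "yes"}))
# {'q': 0.75, 't': 0.25}
```

In this sketch, each leaf distribution plays the role of the alternative arcs, with probabilities, that a pronunciation network could offer for the phoneme in matching contexts. Features describing neighbouring syllables or words could be added to the context dictionaries in the same way to capture longer-range and cross-word effects.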
Similar resources
Automatic rule-based generation of word pronunciation networks
In this paper a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants for each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. T...
Towards Automatic Mispronunciation Detection in Singing
A tool for automatic pronunciation evaluation of singing is desirable for those learning a second language. However, efforts to obtain pronunciation rules for such a tool have been hindered by a lack of data; while many spoken-word datasets exist that can be used in developing the tool, there are relatively few sung-lyrics datasets for such a purpose. In this paper, we demonstrate a proof-of-pri...
Statistical modeling of pronunciation and production variations for speech recognition
In this paper, we propose a procedure for training a pronunciation network with criteria consistent with the optimality objectives for speech recognition systems. In particular, we describe a framework for using maximum likelihood (ML) and minimum classification error (MCE) criteria for pronunciation network optimization. The ML criterion is used to obtain an optimal structure for the pronunciatio...
Automatic generation of multiple pronunciations based on neural networks
We propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (alternative pronunciations) from the canonical pronunciation. This method can generate multiple forms of alternative pronunciations using the pronunciation network. For generating a sophisticated alternative pronunciation dictionary, two ...
Data Driven Approaches to Phonetic Transcription with Integration of Automatic Speech Recognition and Grapheme-to-Phoneme for Spoken Buddhist Sutra
We propose a new approach for performing phonetic transcription of text that utilizes automatic speech recognition (ASR) to help traditional grapheme-to-phoneme (G2P) techniques. This approach was applied to transcribe Chinese text into Taiwanese phonetic symbols. By augmenting the text with speech and using automatic speech recognition with a sausage searching net constructed from multiple pro...